The following is a brief linguistic analysis of the use of racially charged language in William Faulkner’s Absalom, Absalom!. Faulkner’s representation of race was complicated, just as his own relationship with race was complex. As a Southern white moderate, he voiced his anguish over the dehumanization of African Americans under Jim Crow segregation, and at the same time could casually refer to people as “niggers” during the public retelling of a comic story. Indeed, there is no shortage of literature on Faulkner and race in general, and on Absalom, Absalom! in particular. Given this extensive critical history, it almost goes without saying that a computational analysis of word choice, especially with regard to racially charged language, cannot do justice to the complexities and nuances of either the text or Faulkner’s broader critical intervention. Nevertheless, using techniques common in corpus linguistics (CL), it is possible to give a bird’s-eye view of how the use of certain words is patterned. This pattern can then, in turn, inform subsequent close readings.
The following piece uses several techniques available to standard CL analysis, and one more complex analysis that is exclusively available to practitioners who have access to the Digital Yoknapatawpha data set. These different techniques have been split into different parts.
All of the data was generated in the R programming language with the tidyverse suite of packages. The full repository is available at https://github.com/joostburgers/absalom_sentiment_analysis. Due to copyright issues, the repository does not include the Absalom, Absalom! text file used for data analysis.
With any textual analysis, some pre-processing is required. The steps that follow are standard procedures in CL. The text of Absalom, Absalom! was read in as a txt file. It was then broken into nine chapters and further subset into sentences. The individual words were subsequently “tokenized.” The process of tokenization converts capital letters to lowercase and strips special characters and punctuation, which makes it easier for the computer to compare words. Each “stop word” was then removed. These are words like the, a, on, and at that are very frequent within any text and do not add to the analysis. The words were then lemmatized. Lemmatization reduces a word to its base form, or lemma. For example, Negroes becomes Negro. This way all instances of the concept “Negro” are unified as one instance, which prevents creating separate counts for words like Negro, Negroes, and Negro’s.
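The pre-processing steps described above can be sketched in a few lines. This is a minimal illustration in Python and not the R/tidyverse code from the repository. The stop-word list and the lemma lookup table are tiny hypothetical stand-ins for the much larger resources a real analysis would use.

```python
import re

# Illustrative stop-word list (assumption); real lists are far longer.
STOP_WORDS = {"the", "a", "an", "on", "at", "of", "and", "in", "to", "was", "were"}

# Hypothetical lemma lookup (assumption); a real pipeline would use a
# lemmatization dictionary instead of a hand-written table.
LEMMAS = {"negroes": "negro", "negro's": "negro", "horses": "horse", "walked": "walk"}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the text and keep only runs of letters (with internal apostrophes)."""
    normalized = sentence.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)


def preprocess(sentence: str) -> list[str]:
    """Tokenize, drop stop words, then map each remaining token to its lemma."""
    tokens = tokenize(sentence)
    content = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in content]


print(preprocess("The Negroes walked at the horses, and the Negro’s horse."))
```

Words missing from the lookup table pass through unchanged, so the quality of the counts depends directly on how complete the lemma resource is.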
The resulting slate of words was then tagged as racially charged or not by adding a column called race_word that holds TRUE or FALSE for each word. This was done by creating a list of racial words and joining it to the data table through a left join. Essentially, the join checks for any occurrence of a word like “Negro”, “Nigger”, or “Octoroon” and tags it as TRUE. With this pre-processing complete, it is possible to provide some key statistical insights.
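The tagging step can be sketched as follows. This is a Python approximation of the left join, not the author's R code. The column name race_word comes from the text, while the term list here is a deliberately short, incomplete assumption and the function name is hypothetical.

```python
# Illustrative, incomplete term list (assumption); the author's full list is not shown.
RACE_TERMS = {"negro", "octoroon"}


def tag_race_words(tokens):
    """Return one row per token with a boolean race_word column.

    Like a left join, every token is kept; tokens with no match in the
    term list are tagged False instead of being dropped.
    """
    return [{"word": t, "race_word": t in RACE_TERMS} for t in tokens]


rows = tag_race_words(["sutpen", "negro", "architect", "octoroon"])
for row in rows:
    print(row)
```

Keeping the unmatched rows matters because later steps compare racial and non-racial words side by side.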
The chart below shows the ten most frequent non-racial words and racial words in the text. Hovering over the individual bars reveals their precise number, and clicking on TRUE and FALSE turns that particular series on and off.
What is immediately noticeable is that the word “nigger” is the most frequent racial term. It exceeds the word “negro” by 50 counts. It occurs about a third as often as the word Henry (the main character) and about half as often as the name of the racially ambiguous Charles Bon. Importantly, the number of times a character’s name occurs is not the same as the number of times that character actually appears in the text. After all, the pronouns “he” or “she” could equally well denote a character, but that is not shown here.
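The counts behind a chart of this kind could be produced roughly as follows. This is a Python sketch under stated assumptions and not the author's R code: the tokens are invented sample data, the term list is illustrative and incomplete, and the function name is hypothetical.

```python
from collections import Counter

# Illustrative, incomplete term list (assumption).
RACE_TERMS = {"negro", "octoroon"}


def top_words_by_group(tokens, n=10):
    """Count every token, then return the n most frequent words in each group.

    The True key holds racial words, the False key holds all other words.
    """
    counts = Counter(tokens)
    racial = Counter({w: c for w, c in counts.items() if w in RACE_TERMS})
    other = Counter({w: c for w, c in counts.items() if w not in RACE_TERMS})
    return {True: racial.most_common(n), False: other.most_common(n)}


# Invented sample tokens, assumed to be already lowercased and lemmatized.
tokens = ["henry", "negro", "henry", "bon", "octoroon", "negro", "henry"]
top = top_words_by_group(tokens, n=2)
print(top)
```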
Collocation analysis determines which words appear together. This is done by creating n-grams, where n is the number of consecutive words in each sequence. By examining the n-grams around particular words, we can get a better sense of the context. For example, in her research on British newspapers, Dawn Archer has shown that the most common bigram (n-gram of two) for Muslim is “Muslim terrorist.” This strong association between the two words indicates how Muslims are represented in the British media.
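A minimal sketch of bigram counting, in Python and not the author's R code, makes the idea concrete. The function names and the sample tokens are hypothetical, and whether n-grams are built before or after stop-word removal is an assumption here (after).

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collocates(tokens, target, n=2):
    """Count the n-grams that contain the target word."""
    return Counter(g for g in ngrams(tokens, n) if target in g)


# Invented sample tokens, assumed to be already lowercased and lemmatized.
tokens = ["band", "wild", "negro", "beast", "wild", "negro", "upright"]
counts = collocates(tokens, "negro")
print(counts.most_common(3))
```

Sorting the resulting counts shows which neighbors recur most often next to the target word.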
The phrase that stands out the most is one that Rosa Coldfield uses early on: “wild niggers.” It becomes a leitmotif for much of the text and is repeated throughout. Yet who repeats it, and how it is repeated, will change.
In their use of either “wild niggers” or “wild negro”, Quentin and Rosa Coldfield share an inverse relationship. This is curious because it is Rosa who first uses the phrase when referring to the demonic Sutpen arriving in Yoknapatawpha:
Out of quiet thunderclap he would abrupt (man-horse-demon) upon a scene peaceful and decorous as a schoolprize water color, faint sulphur-reek still in hair clothes and beard, with grouped behind him his band of wild niggers like beasts half tamed to walk upright like men, in attitudes wild and reposed, and manacled among them the French architect with his air grim, haggard, and tatter-ran.
It is this initial instance of the phrase, uttered by Rosa, that is carried forward throughout the text. Indeed, the enslaved people are continuously associated with the word “wild”, and it is Quentin whose narrative uses the racial epithet more prominently than others.
We can also look at the word frequency data temporally by casting it across the chapters. This will indicate when a particular word is used often. It may be that some racial words are used in one part of the book and not in others. This gives some indication as to its value in the narrative.
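Casting the counts across chapters amounts to grouping by chapter before counting. The sketch below is in Python and not the author's R code. It assumes each token has already been paired with its chapter number; the rows and the tracked terms are invented for illustration.

```python
from collections import Counter


def counts_by_chapter(rows, terms):
    """Count tracked terms per chapter.

    rows is an iterable of (chapter, word) pairs. Chapters with no tracked
    terms still appear in the result, with an empty Counter.
    """
    result = {}
    for chapter, word in rows:
        chapter_counts = result.setdefault(chapter, Counter())
        if word in terms:
            chapter_counts[word] += 1
    return result


# Invented sample rows (assumption), not data from the novel.
rows = [(1, "negro"), (1, "house"), (7, "negro"), (7, "negro"), (7, "black"), (9, "door")]
by_chapter = counts_by_chapter(rows, {"negro", "black"})
for chapter in sorted(by_chapter):
    print(chapter, dict(by_chapter[chapter]))
```

Raw counts favor long chapters, so dividing by the number of words in each chapter is one possible refinement.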
It is clear that chapter 7 is particularly racially charged. While certain narrators predominate in certain chapters, it would be a mistake to attribute particular words to particular characters based on this raw data. We may recall that chapter 7 is a nested narration in which we are told the story of Thomas Sutpen as he related it to General Compson, and as it was then passed on to Quentin and finally to Shreve. There are several narrative frames, which makes it very difficult to determine whose language this is. What is apparent is that the chapter in which most of Sutpen’s life is revealed is steeped in pejorative racist language. Indeed, in all the other chapters the word negro or black is used more frequently to describe African Americans.
Sentiment analysis is a field of CL that tries to establish the emotional valence of a segment of text. It does so through sentiment libraries. These are lists of words that have been hand-coded to indicate certain emotions, such as joy, sadness, or surprise, or, more broadly, positive and negative valence. In general, sentiment libraries are used for analyzing social media or large data sets, where the narrative data tends to be less complex and the analysis operates at scale. Thus, while the sentiment dictionary might not match each sentiment exactly, in the aggregate the predominant emotion rises to the top.
For literary works, sentiment analysis is far more speculative and merits considerable caution. Without a specially trained dictionary for a specific corpus, sentiment analysis can reveal certain patterns around words, but it is unclear what the margin of error might be. There are, so to speak, unknown unknowns. This is particularly true of Faulkner, who uses many emotionally charged words that might not make their way into a sentiment library, and who uses words like “unamaze” to negate a particular emotion, in this case surprise. Any results that sentiment analysis generates should therefore be seen as a prompt for further inquiry and not a final result.
One of the most basic ways to think through sentiment is to track the positive and negative sentiments across a text. The basic procedure is to tag each positive and negative word in a text and then tabulate these tags by some logical unit, be it a sentence, paragraph, or chapter. This gives the total sentiment of that particular unit. Since we are interested in the emotion surrounding racial words, it makes the most sense to set the unit boundary at the sentence level. This produces a very granular chart, but for Absalom, Absalom! this granularity is very revealing.
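The sentence-level procedure can be sketched as follows. This is a Python illustration and not the author's R code. The lexicon is a toy, hand-made stand-in for a real sentiment library, the term list is illustrative, and the function names are hypothetical.

```python
# Toy sentiment lexicon (assumption); real libraries hold thousands of hand-coded words.
SENTIMENT = {"peaceful": 1, "decorous": 1, "grim": -1, "haggard": -1, "wild": -1}


def sentence_score(tokens):
    """Net sentiment: +1 per positive word, -1 per negative word, 0 for unknown words."""
    return sum(SENTIMENT.get(t, 0) for t in tokens)


def score_sentences(sentences, race_terms):
    """Return one row per sentence with its number, net score, and a race_word flag."""
    return [
        {
            "sentence": i,
            "score": sentence_score(toks),
            "race_word": any(t in race_terms for t in toks),
        }
        for i, toks in enumerate(sentences, start=1)
    ]


# Invented sample sentences, assumed to be already tokenized and lemmatized.
sentences = [["scene", "peaceful", "decorous"], ["wild", "negro", "grim", "haggard"]]
scored = score_sentences(sentences, {"negro"})
print(scored)
```

Because the score is a plain sum, a very long sentence can accumulate a large total simply by containing more words.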
One of the immediate things that stands out about this chart is just how negatively charged the sentences in Absalom, Absalom! are. There are very few positive sentences in this text. The sentences that contain racial words are predominantly negative. Indeed, the sentence with the most negative emotions attached to it is racially charged. It is sentence 1421, which at 969 words is also one of the longest sentences in the text. This is the passage that speaks of Sutpen’s dissolution in the wake of the Civil War and his drunken parleys with Wash Jones. The reason for the overabundance of negative emotions is both the sentence length and its grotesque content.
It is also possible to think through the sentiments attached to a particular word. This can be especially salient when considering the emotions around a character, a process that can be quite involved. One of the things we might want to know is what types of emotions the surrounding words point to when Faulkner uses racial language. We can map all of these emotions through a radar plot. A radar plot uses multiple axes, and the magnitude along each axis, to demonstrate multivariate differences. This is at best a conceptual depiction. Emotions do not work in opposites, and therefore a radar plot pointing strongly in one direction does not necessarily mean that the emotion on the facing axis is absent, or even that it is a true opposite. After all, what is the opposite of surprise?
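The values plotted along the axes of such a radar plot could be tallied roughly as follows. This Python sketch is not the author's R code and rests on several assumptions: the emotion lexicon is a toy, “surrounding words” is taken to mean the other words in the same sentence, and the axis names and function name are hypothetical.

```python
from collections import Counter

# Toy emotion lexicon (assumption): each word may point to several emotions.
EMOTIONS = {
    "grim": ["sadness", "fear"],
    "wild": ["fear", "surprise"],
    "peaceful": ["joy"],
    "demon": ["fear", "anger"],
}

# Fixed axis order for the radar plot (assumption).
AXES = ["anger", "fear", "joy", "sadness", "surprise"]


def emotion_profile(sentences, race_terms):
    """Tally the emotions of words that share a sentence with a racial term.

    Returns one count per axis, in the order given by AXES.
    """
    totals = Counter()
    for toks in sentences:
        if not any(t in race_terms for t in toks):
            continue  # skip sentences without a racial term
        for t in toks:
            if t not in race_terms:
                totals.update(EMOTIONS.get(t, []))
    return [totals[axis] for axis in AXES]


# Invented sample sentences, assumed to be already tokenized and lemmatized.
sentences = [["wild", "negro", "grim"], ["peaceful", "scene"], ["demon", "negro", "wild"]]
profile = emotion_profile(sentences, {"negro"})
print(dict(zip(AXES, profile)))
```

Each axis is counted independently, which matches the caution above: a high value on one axis says nothing about the value on the axis drawn across from it.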